ARCH Models (Volatility Models)#

Goals#

  • Precisely define conditional heteroskedasticity.

  • Explain why we model variance (volatility) instead of the mean for many financial return series.

  • Build ARCH(q) variance equations in LaTeX and implement them from scratch in NumPy.

  • Visualize volatility clustering and variance forecasts with Plotly.

Conditional heteroskedasticity (definition)#

Let \(r_t\) be a return (or mean-zero residual) and \(\mathcal{F}_{t-1}\) the information available up to time \(t-1\).

A standard decomposition is:

\[ r_t = \mu + \varepsilon_t, \qquad \mathbb{E}[\varepsilon_t\mid\mathcal{F}_{t-1}] = 0, \qquad \operatorname{Var}(\varepsilon_t\mid\mathcal{F}_{t-1}) = h_t. \]
  • Homoskedastic models assume \(h_t = \sigma^2\) is constant.

  • Conditionally heteroskedastic models assume \(h_t\) is time-varying and depends on past information \(\mathcal{F}_{t-1}\).

ARCH/GARCH models explicitly specify a recursion for \(h_t\) (the conditional variance).

Why model variance (volatility) instead of the mean?#

For many liquid financial assets at daily (or higher) frequency:

  • The conditional mean \(\mathbb{E}[r_t\mid\mathcal{F}_{t-1}]\) is often small and hard to predict reliably.

  • The conditional variance \(h_t\) shows strong, stable structure (persistence / clustering), making it forecastable.

  • Many decisions depend more on risk than drift: even if the expected return is near 0, the uncertainty is not.

Financial use cases#

  • Risk management: Value-at-Risk (VaR), Expected Shortfall, stress testing, margin.

  • Portfolio construction: volatility targeting, risk parity, dynamic leverage.

  • Derivatives: volatility inputs/forecasts (and a bridge to option-implied volatility).

  • Trading & execution: position sizing, stop placement, liquidity/impact models.

Volatility clustering + market-shock intuition#

Volatility clustering: large moves tend to be followed by large moves (of either sign), and small moves tend to be followed by small moves.

A simple shock story:

  • Suppose a big negative news event hits at time \(t\).

  • The return shock \(\varepsilon_t\) is large in magnitude, so \(\varepsilon_t^2\) is large.

  • In ARCH models, the next conditional variance \(h_{t+1}\) increases because it loads on past squared shocks.

  • With higher \(h_{t+1}\), the next innovation \(\varepsilon_{t+1}\) is more likely to be large in magnitude, so clusters form.
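The shock story above can be made concrete with a one-step ARCH(1) calculation (the parameter values here are purely illustrative):

```python
import numpy as np

# Illustrative ARCH(1): h_t = omega + alpha * eps_{t-1}^2
omega, alpha = 0.05, 0.4
h_bar = omega / (1.0 - alpha)  # long-run variance

# Calm regime: yesterday's shock was small in magnitude
h_calm = omega + alpha * 0.1**2

# Shock regime: a large negative news shock of -1.5
h_shock = omega + alpha * (-1.5) ** 2  # squaring removes the sign

print(f"long-run variance  : {h_bar:.4f}")
print(f"h after small shock: {h_calm:.4f}")
print(f"h after big shock  : {h_shock:.4f}")
```

The big shock pushes next-period variance well above the long-run level, so the next innovation is drawn from a wider distribution and is itself more likely to be large: exactly the clustering mechanism described above.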

ARCH(q) model (LaTeX)#

ARCH models specify the conditional variance as a weighted sum of recent squared innovations.

\[ \varepsilon_t = z_t\sqrt{h_t}, \qquad z_t\ \text{i.i.d. with}\ \mathbb{E}[z_t]=0,\ \operatorname{Var}(z_t)=1 \]
\[ h_t = \omega + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^2 \]

Parameter constraints (to keep \(h_t\) positive)#

  • \(\omega > 0\)

  • \(\alpha_i \ge 0\) for all \(i\)

Stationarity / finite unconditional variance (common condition)#

A typical second-order stationarity condition is:

\[ \sum_{i=1}^{q}\alpha_i < 1 \]

which implies the long-run (unconditional) variance:

\[ \mathbb{E}[h_t] = \operatorname{Var}(\varepsilon_t) = \frac{\omega}{1-\sum_{i=1}^{q}\alpha_i}. \]
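As a quick sanity check of this formula, one can simulate a long ARCH(1) path and compare the sample variance of \(\varepsilon_t\) with \(\omega/(1-\alpha_1)\) (a minimal sketch; the seed, sample size, and parameters are arbitrary choices):

```python
import numpy as np

# Simulate ARCH(1): h_t = omega + alpha * eps_{t-1}^2, eps_t = z_t * sqrt(h_t)
omega, alpha = 0.05, 0.3
rng = np.random.default_rng(0)

n, burn = 200_000, 1_000
z = rng.standard_normal(n + burn)
eps = np.zeros(n + burn)
h = omega / (1.0 - alpha)  # start at the long-run variance
eps[0] = np.sqrt(h) * z[0]
for t in range(1, n + burn):
    h = omega + alpha * eps[t - 1] ** 2
    eps[t] = np.sqrt(h) * z[t]

theoretical = omega / (1.0 - alpha)
sample = eps[burn:].var()
print(f"theoretical: {theoretical:.4f}, sample: {sample:.4f}")
```

With a long enough sample the two numbers should agree to within sampling error.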

Variance equation breakdown (what each term does)#

\[ h_t = \underbrace{\omega}_{\text{baseline variance}} + \sum_{i=1}^{q}\underbrace{\alpha_i}_{\text{news/shock weight}}\underbrace{\varepsilon_{t-i}^2}_{\text{recent shock size}} \]
  • \(\omega\) sets the baseline scale (and pins down the long-run variance when combined with \(\alpha\)’s).

  • \(\alpha_i\) controls how strongly a past shock affects current volatility.

  • Squaring removes the sign: both positive and negative shocks increase variance.

import numpy as np
import pandas as pd

import plotly.graph_objects as go
from plotly.subplots import make_subplots
import os
import plotly.io as pio

pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
pio.templates.default = "plotly_white"

np.set_printoptions(precision=4, suppress=True)
def simulate_arch(T: int, omega: float, alpha, mu: float = 0.0, burn: int = 300, seed: int = 7):
    """Simulate an ARCH(q) process with Gaussian innovations.

    Model:
      r_t = mu + eps_t
      eps_t = z_t * sqrt(h_t)
      h_t = omega + sum_{i=1..q} alpha_i * eps_{t-i}^2
    """
    alpha = np.asarray(alpha, dtype=float)
    q = int(alpha.size)
    if q < 1:
        raise ValueError("alpha must have at least one element (q>=1)")
    if omega <= 0:
        raise ValueError("omega must be > 0")
    if np.any(alpha < 0):
        raise ValueError("all alpha_i must be >= 0")

    alpha_sum = float(alpha.sum())
    if alpha_sum >= 1:
        raise ValueError("Need sum(alpha) < 1 for finite unconditional variance in this demo")

    h_bar = omega / (1.0 - alpha_sum)
    n = int(T + burn)

    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)

    eps = np.zeros(n)
    h = np.full(n, h_bar)

    eps[:q] = np.sqrt(h_bar) * z[:q]
    for t in range(q, n):
        # h_t = omega + sum_i alpha_i * eps_{t-i}^2
        lagged_eps_sq = eps[t - np.arange(1, q + 1)] ** 2
        h[t] = omega + float(np.dot(alpha, lagged_eps_sq))
        eps[t] = np.sqrt(h[t]) * z[t]

    eps = eps[burn:]
    h = h[burn:]
    r = mu + eps
    return r, eps, h


def arch_conditional_variance(eps, omega: float, alpha, initial_variance: float | None = None):
    """Compute h_t for an ARCH(q) model given a residual series eps_t."""
    eps = np.asarray(eps, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    q = int(alpha.size)
    if q < 1:
        raise ValueError("alpha must have at least one element (q>=1)")
    if omega <= 0:
        raise ValueError("omega must be > 0")
    if np.any(alpha < 0):
        raise ValueError("all alpha_i must be >= 0")

    if initial_variance is None:
        alpha_sum = float(alpha.sum())
        if alpha_sum >= 1:
            raise ValueError("Need sum(alpha) < 1 to use the unconditional variance as initialization")
        initial_variance = omega / (1.0 - alpha_sum)

    h = np.full(eps.size, float(initial_variance))
    for t in range(q, eps.size):
        lagged_eps_sq = eps[t - np.arange(1, q + 1)] ** 2
        h[t] = omega + float(np.dot(alpha, lagged_eps_sq))
    return h


def arch_forecast_variance(eps, omega: float, alpha, horizon: int):
    """Multi-step variance forecasts for ARCH(q).

    Uses E[eps_{t+k}^2 | F_t] = h_{t+k|t} (since Var(z)=1).
    """
    eps = np.asarray(eps, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    q = int(alpha.size)
    if eps.size < q:
        raise ValueError("Need at least q residuals to forecast")

    eps_sq_lags = eps[-np.arange(1, q + 1)] ** 2  # [eps_t^2, eps_{t-1}^2, ...]
    forecasts = np.zeros(int(horizon), dtype=float)
    for k in range(int(horizon)):
        h_next = omega + float(np.dot(alpha, eps_sq_lags))
        forecasts[k] = h_next
        eps_sq_lags = np.concatenate([[h_next], eps_sq_lags[:-1]])
    return forecasts
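For ARCH(1) the multi-step forecast has a closed form that mirrors what `arch_forecast_variance` computes recursively: forecasts decay geometrically (at rate \(\alpha_1\)) from \(h_{T+1\mid T}\) toward the long-run variance \(\bar h\). A self-contained check (the numbers are illustrative):

```python
import numpy as np

# ARCH(1): h_{T+1|T} = omega + alpha * eps_T^2, and for k >= 1
#   h_{T+k|T} = h_bar + alpha**(k-1) * (h_{T+1|T} - h_bar)
omega, alpha, eps_T = 0.05, 0.3, 1.2

h_bar = omega / (1.0 - alpha)
h1 = omega + alpha * eps_T**2

k = np.arange(1, 11)
closed_form = h_bar + alpha ** (k - 1) * (h1 - h_bar)

# The recursion used in the text, E[eps_{T+k}^2 | F_T] = h_{T+k|T}:
rec = np.zeros(10)
h_prev = h1
for i in range(10):
    rec[i] = h_prev
    h_prev = omega + alpha * h_prev

print(np.allclose(closed_form, rec))  # → True
```

This geometric decay toward \(\bar h\) is what the forecast figure below displays.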
# Example: simulate an ARCH(2) return series to visualize volatility clustering
T = 2000
mu = 0.0
omega = 0.05
alpha = np.array([0.25, 0.15])  # sum < 1

r, eps, h = simulate_arch(T=T, omega=omega, alpha=alpha, mu=mu, seed=42)

view = 600
df = pd.DataFrame(
    {
        "t": np.arange(T),
        "return": r,
        "eps_sq": eps**2,
        "h": h,
        "sigma": np.sqrt(h),
    }
).tail(view)

fig = make_subplots(
    rows=2,
    cols=1,
    shared_xaxes=True,
    vertical_spacing=0.06,
    subplot_titles=("Simulated returns", "Volatility clustering: squared returns vs conditional variance"),
)

fig.add_trace(go.Scatter(x=df["t"], y=df["return"], mode="lines", name="r_t"), row=1, col=1)

fig.add_trace(
    go.Scatter(x=df["t"], y=df["eps_sq"], mode="lines", name=r"$\varepsilon_t^2$", line=dict(width=1)),
    row=2,
    col=1,
)
fig.add_trace(
    go.Scatter(x=df["t"], y=df["h"], mode="lines", name=r"$h_t$", line=dict(width=2)),
    row=2,
    col=1,
)

fig.update_yaxes(title_text="return", row=1, col=1)
fig.update_yaxes(title_text="variance", row=2, col=1)
fig.update_xaxes(title_text="time", row=2, col=1)
fig.update_layout(height=650, legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="left", x=0))
fig.show()
# Variance forecasts: h_{T+k | T}
horizon = 60
h_fore = arch_forecast_variance(eps, omega=omega, alpha=alpha, horizon=horizon)

alpha_sum = float(alpha.sum())
h_bar = omega / (1.0 - alpha_sum)

lookback = 250
hist_x = np.arange(T - lookback, T)
fore_x = np.arange(T, T + horizon)

fig = go.Figure()
fig.add_trace(go.Scatter(x=hist_x, y=h[-lookback:], mode="lines", name=r"historical $h_t$"))
fig.add_trace(go.Scatter(x=fore_x, y=h_fore, mode="lines", name=r"forecast $h_{T+k|T}$"))
fig.add_trace(
    go.Scatter(
        x=np.concatenate([hist_x, fore_x]),
        y=np.full(hist_x.size + fore_x.size, h_bar),
        mode="lines",
        name=r"long-run $\bar h$",
        line=dict(dash="dash"),
    )
)

fig.update_layout(
    title="ARCH variance forecast (mean reversion toward long-run variance)",
    xaxis_title="time",
    yaxis_title="variance",
    height=450,
)
fig.show()

Notes / pitfalls#

  • ARCH models react to recent shocks but can require a large \(q\) to capture the long persistence seen in markets.

  • A frequent extension is GARCH, which adds lagged variance terms to get slow decay with few parameters.

  • The condition \(\sum_i \alpha_i < 1\) above targets a finite unconditional variance (covariance stationarity); conditions for strict stationarity are more nuanced and can hold even when that sum reaches or exceeds 1.
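To make the GARCH extension mentioned above concrete, here is a minimal sketch of the GARCH(1,1) variance recursion, which adds a single lagged-variance term to ARCH(1) (function name and parameter values are illustrative, not from the text):

```python
import numpy as np

def garch11_variance(eps, omega, alpha, beta, h0=None):
    """Conditional variance under GARCH(1,1):
        h_t = omega + alpha * eps_{t-1}^2 + beta * h_{t-1}.
    Positivity needs omega > 0 and alpha, beta >= 0; alpha + beta < 1
    gives a finite unconditional variance omega / (1 - alpha - beta)."""
    eps = np.asarray(eps, dtype=float)
    if h0 is None:
        h0 = omega / (1.0 - alpha - beta)  # initialize at the long-run level
    h = np.empty(eps.size)
    h[0] = float(h0)
    for t in range(1, eps.size):
        h[t] = omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1]
    return h

# alpha + beta near 1 produces the slow volatility decay that pure ARCH
# would need many lags (large q) to match.
h = garch11_variance(np.array([0.1, -1.5, 0.2, 0.1, 0.05]),
                     omega=0.02, alpha=0.1, beta=0.85)
print(h)
```

The `beta * h_{t-1}` term is what lets shocks die out slowly with only three parameters.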